Data migration with placement based on access patterns

ABSTRACT

A method, executed by a computer, includes determining an access metric, an input/output operations per second (IOPs) metric, and a size metric for each data target of a plurality of data targets, ranking the plurality of data targets according to the access metric of each data target, and assigning each data target to a storage pool of a plurality of storage pools according to the IOPs metric and the size metric of the data target. A computer system and computer program product corresponding to the above method are also disclosed herein.

BACKGROUND OF THE INVENTION

The present invention relates generally to the migration of data, and more specifically, to the intelligent placement of data during migration based on access patterns.

In the field of data migration, source and destination systems may have different storage tiers and file placement policies for those tiers. For example, a source system may have storage tiers such as flash, disk, tape, and external cloud tiers, whereas a destination system may only have tape storage. During a migration, data may be moved multiple times: first, data may be moved from the source system to the destination system, and then, the data may be rearranged across the various storage tiers of the destination system in an effort to optimize its placement. Thus, data migration can be made more efficient by optimizing the initial placement of data on a destination system's storage tiers.

SUMMARY

As disclosed herein, a method, executed by a computer, includes determining an access metric, an input/output operations per second (IOPs) metric, and a size metric for each data target of a plurality of data targets, ranking the plurality of data targets according to the access metric of each data target, and assigning each data target to a storage pool of a plurality of storage pools according to the IOPs metric and the size metric of the data target. A computer system and computer program product corresponding to the above method are also disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram depicting an example of a migration environment in accordance with embodiments of the present invention;

FIG. 2 is a flow chart depicting an example of a target access counting method in accordance with embodiments of the present invention;

FIG. 3 is a table depicting an example of target records in accordance with embodiments of the present invention;

FIG. 4 is a flow chart depicting an example of a target placement method in accordance with embodiments of the present invention;

FIG. 5 is a table depicting an example of a pool configuration table in accordance with embodiments of the present invention;

FIG. 6 is a table depicting an example of a pool assignment table in accordance with embodiments of the present invention; and

FIG. 7 is a block diagram depicting one example of a computing apparatus (i.e., computer) suitable for executing the methods disclosed herein.

DETAILED DESCRIPTION

Embodiments of the present invention relate generally to the migration of data, and more specifically, to the intelligent placement of data during migration based on the data's access patterns. By analyzing access patterns of data, individual data files or objects can be classified as “hotter” or “colder” than other files based on their relative access frequencies. For example, a file that is accessed four times per hour is hotter than a file on the same system that is accessed only once per hour.

By analyzing the audit logs of data being migrated, it is possible to determine, in terms of how hot or cold the data is, where to place data on the destination system. In particular, hotter data may be placed on a “hot” storage pool (such as flash or disk), and colder data may be placed on a “cold” storage pool such as tape. Thus, data files/objects can be migrated directly to the correct storage tier of a target system, thereby eliminating intermediate data movements. Furthermore, embodiments of the present invention use audit logs to discern the heat information about the source data, and thus do not require the addition of a network device in the data migration path.

It should be noted that references throughout this specification to features, advantages, or similar language herein do not imply that all of the features and advantages that may be realized with the embodiments disclosed herein should be, or are in, any single embodiment of the invention. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, discussion of the features, advantages, and similar language, throughout this specification may, but do not necessarily, refer to the same embodiment.

Furthermore, the described features, advantages, and characteristics of the invention may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the invention may be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the invention.

These features and advantages will become more fully apparent from the following drawings, description and appended claims, or may be learned by the practice of the invention as set forth hereinafter. The present invention will now be described in detail with reference to the figures.

FIG. 1 is a block diagram depicting an example of a migration environment 100 in accordance with embodiments of the present invention. As depicted, migration environment 100 includes source 110 with targets 120A-120C and audit log 130, scanning module 140, target records 150, destination 160, and storage pools 170A-170C. Migration environment 100 enables the migration of targets 120A-120C from source 110 to destination 160, where the targets are intelligently placed on storage pools 170A-170C.

Source 110 may be any system that stores data. Source 110 may include one or more of any non-volatile storage media known in the art. For example, source 110 can be implemented with a tape library, optical library, one or more independent hard disk drives, or multiple hard disk drives in a redundant array of independent disks (RAID). Similarly, data on source 110 may conform to any suitable storage architecture known in the art, such as a file, a relational database, an object-oriented database, and/or one or more tables.

Targets 120A-120C may refer to data stored on source 110. Targets 120A-120C may include files, directories, objects, containers, or a combination thereof; each target specifically refers to a file, directory, object, or container stored on source 110. Targets 120A-120C are any file, directory, object, or container that is targeted for migration from source 110 to destination 160. Three targets 120A-120C are shown for exemplary purposes; there may be greater or fewer targets on source 110. Targets 120A-120C may be stored together in a single storage pool on source 110, or across multiple storage pools.

Audit log 130 may include any log containing records of the access frequency of targets 120A-120C. Audit log 130 may record the time at which any read or write operations interacted with a specific target. For example, audit log 130 may have a record of target 120A being accessed three times, target 120B being accessed once, and target 120C not being accessed at all over a particular time span. In some embodiments, audit log 130 is routinely generated or updated by source 110 whenever a target is created, written to, or read from.

Scanning module 140 may analyze and determine the access frequencies of targets 120A-120C and generate/maintain target records 150. In some embodiments, scanning module 140 performs target access counting method 200 (see below). Scanning module 140 may analyze audit log 130 in order to determine the relative access frequencies of targets 120A-120C. Scanning module 140 may output its analysis as target records 150. Scanning module 140 may also receive storage pool information from destination 160, including the storage size, input/output operations per second (IOPs), and configuration information of each storage pool. In some embodiments, scanning module 140 assigns each target in target records 150 to a particular storage pool.

Target records 150 may be a list of records that is created and maintained by scanning module 140. Target records 150 may track each target, along with the number of access events of a target, the IOPs associated with a target, and the data size of a target. In some embodiments, target records 150 is updated by scanning module 140 to indicate the storage pool on destination 160 to which a target is assigned.

Destination 160 may be any system that stores data, similar to source 110. Destination 160 may include one or more storage pools that act as the ultimate storage destination of targets 120A-120C that are being migrated. During a migration, targets 120A-120C are moved from source 110 to destination 160. Destination 160 may also receive instructions regarding the storage pool in which to place each target.

Storage pools 170A-170C may represent the tiered storage destination for migrated targets. For example, storage pool 170A may be a “gold” performance tier that is capable of high IOPs (such as flash memory), and is ideal for hotter files that are accessed more frequently. Storage pool 170B may be an intermediate tier, and may utilize disk storage. Storage pool 170C may be a lower performance tier that is suited for colder files, and may include tape storage. Thus, each storage pool 170A-170C may correlate to a particular tier upon which data may be stored according to its size and frequency-of-access requirements. While three storage pools 170A-170C are depicted for exemplary purposes, destination 160 may have any number of storage pools, each representing a particular performance tier of storage. The performance specifications of each storage pool 170A-170C may be configurable by a user and/or derived from hardware specifications of the physical storage media therein.

Network 180 can be, for example, a local area network (LAN), a wide area network (WAN) such as the Internet, or a combination of the two, and include wired, wireless, or fiber optic connections. In general, network 180 can be any combination of connections and protocols that will support communications between source 110, scanning module 140, and/or destination 160 in accordance with an embodiment of the present invention.

FIG. 2 is a flow chart depicting an example of a target access counting method 200 in accordance with embodiments of the present invention. As depicted, target access counting method 200 includes scanning (210) an audit log, determining (220) whether a record exists, incrementing (230) the count, adding (240) a new record, and determining (250) whether there are additional targets. Target access counting method 200 analyzes audit log 130 to determine the relative access frequencies of data targets on source 110.

Scanning (210) the audit log may include using scanning module 140 to scan through entries on audit log 130 regarding targets 120A-120C. Each entry may contain a data target (either a file, directory, object, or container) along with a timestamp of when it was accessed.

Determining (220) whether a record exists includes determining whether scanning module 140 has a record for an entry in audit log 130. Thus, scanning module 140 may check to see if there is a record for each target that is listed in audit log 130. In some embodiments, scanning module 140 stores records in target records 150. If scanning module 140 determines that there is already a record in target records 150 for an entry in audit log 130, then target access counting method 200 may proceed to incrementing (230) the count; if there is not a record, then target access counting method 200 may proceed to adding (240) a new record.

Incrementing (230) the count may include incrementing a count associated with a target in target records 150. Scanning module 140 may increment the count every time the access of a target is listed in audit log 130, so the count may correspond to the number of access events of a target. Adding (240) a new record to target records 150 occurs if there is not already a record corresponding to a target. When a new record is created, the count may be set to one.

Determining (250) whether there are additional targets may include proceeding to scan audit log 130 to determine if more target access events are listed. If there are no more target access events, then target access counting method 200 may terminate; if additional target access events exist, then method 200 loops back to determining (220) whether a record exists for that target. In this manner, target access counting method 200 processes audit log 130, one entry at a time, counting the number of access events of each target, and storing the count as an entry in target records 150.
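The counting loop of method 200 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the entry format (a pair of target name and timestamp), the function name count_accesses, and the sample entries are hypothetical and do not appear in the figures.

```python
def count_accesses(audit_entries):
    """Count access events per target from (target_name, timestamp) entries."""
    records = {}  # hypothetical stand-in for target records 150
    for target_name, _timestamp in audit_entries:  # scanning (210), one entry at a time
        if target_name in records:                 # determining (220): record exists?
            records[target_name] += 1              # incrementing (230) the count
        else:
            records[target_name] = 1               # adding (240): new record, count set to one
    return records                                 # loop ends when no entries remain (250)


# Hypothetical audit log entries: (target name, access time in seconds)
audit_entries = [("File1", 100), ("File2", 160), ("File1", 220), ("File1", 400)]
counts = count_accesses(audit_entries)
```

A dictionary keyed by target name is used here only because it makes the "does a record exist" check a single lookup; any keyed record store would serve.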

FIG. 3 is a table depicting an example of target records 300 in accordance with embodiments of the present invention. Target records 300 may include a list of target names 310, along with number of accesses 320, input/output operations per second 330, and size 340 of each target. Target records 300 may be an example of the output of scanning module 140 according to target access counting method 200.

Target names 310 may include a listing of each target, such as targets 120A-120C, that appears in audit log 130 and/or is processed by scanning module 140. Number of accesses 320 may correspond to the number of times that the corresponding target was accessed, as determined by target access counting method 200. Input/output operations per second (IOPs) 330 may be calculated by dividing the number of accesses of a particular target by the amount of time over which audit log 130 captured data, converted to operations per second. Thus, IOPs 330 represents a normalized number of input/output operations. Size 340 may include the size of the target as measured in units of digital information such as bytes.
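The IOPs calculation described above amounts to a single division. The following Python sketch illustrates it; the function name iops_metric and the sample figures (7200 accesses over a one-hour log) are assumptions chosen for illustration only.

```python
def iops_metric(num_accesses, log_span_seconds):
    """Normalize an access count to operations per second over the audit log's time span."""
    if log_span_seconds <= 0:
        raise ValueError("audit log time span must be positive")
    return num_accesses / log_span_seconds


# Hypothetical target: 7200 access events captured over a 3600-second audit log
rate = iops_metric(7200, 3600)
```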

After scanning module 140 creates target records 300, target records 300 may be sorted first according to the number of accesses 320 (and thus, IOPs 330) in descending order. Target records 300 may further be sorted by size 340 in descending order. For instance, in the depicted example, File6 and File7 are tied for number of accesses, so they are further sorted by size in descending order.
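The two-level ordering can be sketched in Python with a single sort on a composite key. The TargetRecord class, the rank_targets function, and the sample values are hypothetical; the values are not taken from FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class TargetRecord:
    name: str
    accesses: int    # number of accesses 320
    size_bytes: int  # size 340


def rank_targets(records):
    """Sort by access count, then by size, both in descending order."""
    return sorted(records, key=lambda r: (r.accesses, r.size_bytes), reverse=True)


# Hypothetical records: two targets tied on access count, one hotter target
records = [
    TargetRecord("small_tied", accesses=4, size_bytes=100),
    TargetRecord("hot", accesses=9, size_bytes=50),
    TargetRecord("large_tied", accesses=4, size_bytes=900),
]
ranked_names = [r.name for r in rank_targets(records)]
```

Because IOPs 330 is the access count divided by a time span common to all targets, sorting on the access count yields the same order as sorting on IOPs.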

FIG. 4 is a flow chart depicting an example of a target placement method 400 in accordance with embodiments of the present invention. As depicted, target placement method 400 includes determining (410) whether an IOPs threshold is met, determining (420) whether a capacity threshold is met, updating (430) the list with a pool assignment, determining (440) whether there are more targets, determining (450) whether there are more pools, and updating (460) the list with no pool assignment. Target placement method 400 takes into account the IOPs and size of each target and assigns it to a proper storage pool based on the storage pool's specifications. Target placement method 400 assigns targets to storage pools by processing target records 300 in descending order, and attempting to pair each target with the highest-rated pool that is available.

Determining (410) whether the IOPs threshold is met may include comparing the IOPs 330 of a target to the IOPs threshold of a storage pool. The IOPs threshold of a storage pool may be some predetermined fraction of the maximum IOPs for which it is rated or capable. In some embodiments, determination operation 410 takes into account the IOPs of a storage pool that are already consumed, and factors that into the IOPs or IOPs threshold that is available for the target currently being placed by target placement method 400. If it is determined that the IOPs of the target meets the IOPs threshold, target placement method 400 proceeds to determination operation 420. If not, method 400 proceeds to operation 450.

Determining (420) whether the storage capacity threshold is met may include comparing the size 340 of a target to the storage capacity threshold of a storage pool. The storage capacity threshold of a storage pool may be some predetermined fraction of the maximum storage volume of the pool. In some embodiments, determination operation 420 takes into account the storage consumed by other targets placed on a storage pool, and factors that into the storage capacity or storage capacity threshold that is available for the target currently being placed by target placement method 400. In some embodiments, target placement method 400 bypasses the IOPs and storage thresholds and instead uses the full IOPs and storage that is available. If it is determined that the size of the target meets the storage threshold, target placement method 400 proceeds to updating operation 430. If not, method 400 proceeds to operation 450.

Updating (430) the list with a pool assignment may include updating a target's records with its pool assignment, which is the storage pool that the target's requirements were compared against in determination operations 410 and 420. In some embodiments, the pool assignment is added to target records 300. Once targets are assigned to the correct pools, migration may take place from source 110 to destination 160.

Determining (440) whether there are more targets may include proceeding down target records 300 to the next target. If there is another target, target placement method 400 loops back to determination operation 410. If there are no more targets, method 400 may terminate.

When a target does not meet the IOPs threshold or capacity threshold of a storage pool, target placement method 400 proceeds to determining (450) whether there are additional pools available. Destination 160 may report which storage pools, such as pools 170A-170C, are available. Target placement method 400 may proceed to the next highest-rated storage pool (see description of FIG. 5 for additional information). If there is at least one available storage pool, target placement method 400 loops back to determination operation 410. If there are no more pools available, method 400 proceeds to operation 460.

Updating (460) the list with no pool assignment may include updating target records 300 to reflect that the target was not able to be placed into a storage pool. When a target is unable to be placed into a storage pool, it may be placed into a default or predetermined storage location during the migration process.
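One possible reading of target placement method 400 is sketched below in Python. It rests on an assumption that the specification does not state outright: a target "meets" a pool's threshold when its IOPs or size, added to what earlier placements have already consumed, does not exceed that threshold. The Pool class, the place_targets function, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    size_threshold: float  # usable capacity (assumed unit: bytes)
    iops_threshold: float  # usable IOPs
    used_size: float = 0.0
    used_iops: float = 0.0


def place_targets(targets, pools):
    """targets: (name, iops, size) tuples in ranked order; pools: highest-rated first."""
    assignments = {}
    for name, iops, size in targets:                 # determining (440): next target
        assignments[name] = None                     # updating (460) unless a pool fits
        for pool in pools:                           # determining (450): next pool
            fits_iops = pool.used_iops + iops <= pool.iops_threshold  # determining (410)
            fits_size = pool.used_size + size <= pool.size_threshold  # determining (420)
            if fits_iops and fits_size:
                pool.used_iops += iops               # track consumed resources
                pool.used_size += size
                assignments[name] = pool.name        # updating (430)
                break
    return assignments


# Hypothetical pools and ranked targets
pools = [Pool("gold", size_threshold=100, iops_threshold=10),
         Pool("bronze", size_threshold=1000, iops_threshold=5)]
targets = [("a", 8, 50), ("b", 4, 60), ("c", 3, 2000)]
assignments = place_targets(targets, pools)
```

In this sketch a target that fits no pool is recorded with an assignment of None, which a migration tool could map to the default or predetermined storage location mentioned above.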

FIG. 5 is a table depicting an example of a pool configuration table 500 in accordance with embodiments of the present invention. Pool configuration table 500 includes a listing of storage pools 510, along with the corresponding size 520, IOPs 530, size threshold 540, and IOPs threshold 550 of each storage pool. Pool configuration table 500 may be used by target placement method 400 in order to select a pool for a target.

Storage pools 510 may include a listing of pools by name or class in descending order. In some embodiments, the highest-rated pool is a “platinum” tier, followed by “gold,” “silver,” and “bronze.” Pool size 520 may list the storage volume size of each pool, and IOPs 530 may list the IOPs that a pool is capable of delivering. Size threshold 540 may be a predetermined or selected fraction of the maximum size capacity, and similarly, IOPs threshold 550 may be a fraction of the maximum IOPs capability. A threshold may be used in order to ensure that the destination system's resources are never 100% utilized. In some embodiments, pool size 520, IOPs 530, size threshold 540, and/or IOPs threshold 550 are updated to reflect the currently available storage and IOPs.
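Deriving the two thresholds from a pool's rated maximums is a simple scaling, sketched here in Python. The function name usable_limits, the use of one shared fraction for both thresholds, and the sample values are assumptions for illustration; the specification allows each threshold to have its own fraction.

```python
def usable_limits(pool_size, pool_iops, fraction):
    """Return (size threshold, IOPs threshold) as a fraction of the pool's maximums."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return pool_size * fraction, pool_iops * fraction


# Hypothetical pool: 1000 units of capacity, 200 IOPs, half of each held in reserve
size_threshold, iops_threshold = usable_limits(1000, 200, 0.5)
```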

FIG. 6 is a table depicting an example of a pool assignment table 600 in accordance with embodiments of the present invention. As depicted, pool assignment table 600 includes information from target records table 300 such as target name 310, number of accesses 320, IOPs 330, and size 340. Pool assignment table 600 further includes storage pool assignment 610. In some embodiments, pool assignment table 600 is target records table 300 with an updated pool assignment 610 field. Pool assignment table 600 may be generated by target placement method 400, and may be used during migration to determine the destination pool of a target.

FIG. 7 is a block diagram depicting components of a computer 700 suitable for executing the methods disclosed herein. It should be appreciated that FIG. 7 provides only an illustration of one embodiment and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made.

As depicted, the computer 700 includes communications fabric 702, which provides communications between computer processor(s) 704, memory 706, persistent storage 708, communications unit 712, and input/output (I/O) interface(s) 714. Communications fabric 702 can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system. For example, communications fabric 702 can be implemented with one or more buses.

Memory 706 and persistent storage 708 are computer readable storage media. In the depicted embodiment, memory 706 includes random access memory (RAM) 716 and cache memory 718. In general, memory 706 can include any suitable volatile or non-volatile computer readable storage media.

One or more programs may be stored in persistent storage 708 for execution by one or more of the respective computer processors 704 via one or more memories of memory 706. The persistent storage 708 may be a magnetic hard disk drive, a solid state hard drive, a semiconductor storage device, read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, or any other computer readable storage media that is capable of storing program instructions or digital information.

The media used by persistent storage 708 may also be removable. For example, a removable hard drive may be used for persistent storage 708. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer readable storage medium that is also part of persistent storage 708.

Communications unit 712, in these examples, provides for communications with other data processing systems or devices. In these examples, communications unit 712 includes one or more network interface cards. Communications unit 712 may provide communications through the use of either or both physical and wireless communications links.

I/O interface(s) 714 allows for input and output of data with other devices that may be connected to computer 700. For example, I/O interface 714 may provide a connection to external devices 720 such as a keyboard, keypad, a touch screen, and/or some other suitable input device. External devices 720 can also include portable computer readable storage media such as, for example, thumb drives, portable optical or magnetic disks, and memory cards.

Software and data used to practice embodiments of the present invention can be stored on such portable computer readable storage media and can be loaded onto persistent storage 708 via I/O interface(s) 714. I/O interface(s) 714 may also connect to a display 722. Display 722 provides a mechanism to display data to a user and may be, for example, a computer monitor.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The embodiments disclosed herein include a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out the methods disclosed herein.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

What is claimed is:
 1. A method, executed by a computer, the method comprising: determining an access metric, an input/output operations per second (IOPs) metric, and a size metric for each data target of a plurality of data targets, wherein each data target of the plurality of targets is selected from the list consisting of a file, a directory, an object and a container, and wherein the access metric of a data target is calculated by analyzing an audit log corresponding to the data target; ranking the plurality of data targets according to the access metric and further according to the size metric of each data target; assigning each data target to a storage pool of a plurality of storage pools according to the IOPs metric and the size metric of the data target, wherein the plurality of data targets are assigned to the plurality of storage pools in an order corresponding to the ranking of the plurality of data targets, and wherein assigning each data target to a storage pool of a plurality of storage pools comprises selecting a storage pool according to one or more of a pool storage metric, a pool storage threshold metric, a pool IOPs metric, and a pool IOPs threshold metric; and copying each data target to its assigned storage pool.